Abstract: We consider distributed storage (DS) for a wireless
network where mobile devices arrive and depart according to 
a Poisson random process. Content is stored in a number 
of mobile devices, using an erasure correcting code. When 
requesting a piece of content, a user retrieves the content from 
the mobile devices using device-to-device communication or, if 
not possible, from the base station (BS), at the expense of a 
higher communication cost. We consider the repair problem when 
a device that stores data leaves the network. In particular, we 
introduce a repair scheduling where repair is performed (from 
storage devices or the BS) periodically. We derive analytical 
expressions for the overall communication cost of repair and 
download as a function of the repair interval. We illustrate 
the analysis by giving results for maximum distance separable 
codes and regenerating codes. Our results indicate that DS
can reduce the overall communication cost with respect to the 
case where content is only downloaded from the BS, provided 
that repairs are performed frequently enough. The required 
repair frequency depends on the code used for storage and 
the network parameters. In particular, minimum bandwidth 
regenerating codes require frequent repairs, while maximum 
distance separable codes give better performance if repair is 
performed less frequently. We also show that instantaneous repair 
is not always optimal. 

I. Introduction 

It is predicted that global mobile data traffic will reach 
24.3 exabytes per month by 2019, nearly a tenfold increase 
compared to the traffic in 2014 [1]. This dramatic increase
in mobile data traffic threatens to completely congest the 
already burdened wireless networks. One popular approach 
to reduce peak traffic is to store popular data closer to the 
end users, a technique also known as caching. Recently, a 
novel architecture was proposed to efficiently handle highly 
predictable bulky traffic, such as video traffic. The idea
is to deploy a number of access points (called helpers) with 
large storage capacity, but low-rate wireless backhaul, and 
store data across them. Users can then download content from 
the helpers, resulting in a performance gain. 

In [?] it was suggested to store content directly in the
mobile devices, taking advantage of the high storage capacity
of modern smart phones and tablets. Hence, no additional
infrastructure is required. Traffic to the BS is alleviated by
maximizing the number of times a requested file can be
retrieved from the mobile devices storing content, using
device-to-device (D2D) communication. The problem of repairing the
lost data when a device leaves the network was considered
in [?], where data is stored in the mobile devices using
erasure correcting coding. In particular, the communication
cost incurred by data download and repair is analyzed in [?],
assuming an infinite storage capacity in the mobile devices
and instantaneous repair.

In this paper, we consider distributed storage (DS) in a wireless
network scenario similar to the one in [?]. We consider a
cellular system where mobile devices roam in and out of a cell 
according to a Poisson random process and request content 
at random times. The cell is served by a base station (BS), 
which always has access to the content. Content is also stored 
across a limited number of mobile devices using an erasure 
correcting code. When a user requests a piece of content, it 
attempts to download it from the mobile devices using D2D 
communication. If not possible, the content is downloaded 
from the BS, at the expense of a higher communication cost. 
Our main focus is on the repair problem when a device that 
stores data leaves the network. In particular, we introduce a 
repair scheduling where lost content is repaired (from storage 
devices sojourning in the cell or from the BS) at periodic times. 
We derive analytical expressions for the total communication 
cost of repair and download as a function of the repair interval. 
Furthermore, we analyze several erasure correcting codes,
namely maximum distance separable (MDS) codes and regenerating
codes. We show that DS can reduce the overall communication
cost as compared to the classical scenario where content 
is only downloaded from the BS, provided that repairs are 
performed frequently enough. The required frequency depends 
on the code family and on the network parameters. Somewhat 
surprisingly, instantaneous repair is not always optimal.

II. System Model 

We consider a single cell in a cellular network, served by 
a BS, where mobile devices (referred to as nodes) arrive and 
depart according to a Poisson process. The average number of 
nodes in the network is N. Nodes wish to download content 
from the network. For simplicity, we assume that there is a
single object (file), of size M bits, stored at the BS. We further 
assume that nodes can store data and communicate between 
them using D2D communication. The considered scenario is 
depicted in Fig. 1.

Figure 1. A wireless network with data storage in the mobile devices (nodes).
A new node arrives to the network at rate $N\lambda$. The departure rate per node is
$\mu$. Blue nodes store exactly $\alpha$ bits each. The green node requests the file and
downloads it from the storage nodes (solid arrows), or from the BS (dashed
arrow). The repair of a node (in red) is carried out by transmitting $\gamma_{\mathrm{D2D}}$ bits
from storage nodes (solid arrows) or $\gamma_{\mathrm{BS}}$ bits from the BS (dashed arrow).

Arrival-departure model. Nodes arrive according to a Poisson
process with independent and identically distributed (i.i.d.)
exponential random inter-arrival times $T_\mathrm{a}$ with probability
density function (pdf)

$$f_{T_\mathrm{a}}(t) = N\lambda e^{-N\lambda t}, \quad t \ge 0, \tag{1}$$

where $N\lambda$ is the expected arrival rate of nodes and $t \in \mathbb{R}$ is
time, measured in time units (t.u.).

The nodes stay in the cell for an i.i.d. exponential random
lifetime $T_\mathrm{l}$ with pdf

$$f_{T_\mathrm{l}}(t) = \mu e^{-\mu t}, \quad t \ge 0, \tag{2}$$

where $\mu$ is the expected departure rate of a node. The number
of nodes in the cell can be described by an M/M/$\infty$ queuing
model. We assume that $\mu = \lambda$, i.e., the average number of
nodes in the cell stays constant (equal to $N$).

Data storage. The file is partitioned into $k$ packets and
encoded using an $(n, k)$ erasure correcting code of rate
$R = k/n$. The encoded data is stored in $n$ nodes, referred to as
storage nodes. For simplicity, we assume $n \ll N$, hence the
probability that the number of nodes in the cell is smaller than
$n$ is negligibly small. Therefore, the file can always be stored
in the network. In particular, each storage node stores exactly
$\alpha$ bits, i.e., we consider a symmetric allocation [?]. Hence,

$$\alpha = \frac{M}{k}. \tag{3}$$

As in [?], we also introduce an overall storage budget constraint
of $\Gamma M$ bits, $\Gamma \ge 1$, across the nodes in the cell, i.e.,
$n\alpha \le \Gamma M$. Note that to satisfy this constraint, $R \ge 1/\Gamma$.
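The two assumptions above (the file always fits because $n \ll N$, and the budget constraint $n\alpha \le \Gamma M$) can be checked numerically. The following is a minimal sketch, assuming that the stationary number of nodes in an M/M/$\infty$ cell is Poisson distributed with mean $N\lambda/\mu = N$ (a standard queueing result that the text does not state explicitly). The function names are illustrative only.

```python
import math


def prob_fewer_than(n: int, mean_nodes: float) -> float:
    """P(X < n) for X ~ Poisson(mean_nodes).

    Assumption: Poisson is the stationary distribution of the number of
    nodes in the M/M/inf cell; mean_nodes plays the role of N.
    """
    total = 0.0
    for x in range(n):
        # Evaluate the pmf in log-space to stay numerically safe.
        log_pmf = -mean_nodes + x * math.log(mean_nodes) - math.lgamma(x + 1)
        total += math.exp(log_pmf)
    return total


def satisfies_budget(n: int, k: float, budget_factor: float) -> bool:
    """Check n * alpha <= Gamma * M with alpha = M / k, i.e. R = k/n >= 1/Gamma."""
    return n / k <= budget_factor


if __name__ == "__main__":
    # Hypothetical values: n = 10 storage nodes, N = 100 nodes on average.
    print(prob_fewer_than(10, 100.0))
    print(satisfies_budget(10, 2, 5.0))
```

With $n = 10$ and $N = 100$, the probability of having fewer than $n$ nodes in the cell is vanishingly small under this assumption, which is what makes "the file can always be stored" a reasonable simplification.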

Data delivery. Nodes request the file at random times with
i.i.d. random inter-request times $T_\mathrm{r}$ with pdf

$$f_{T_\mathrm{r}}(t) = \omega e^{-\omega t}, \quad t \ge 0, \tag{4}$$

where $\omega$ is the expected request rate per node. Whenever
possible, the file is downloaded from the storage nodes using D2D
communication, referred to as D2D download. In particular,
we assume that data can be downloaded from any subset of
$h \in \{k, \ldots, n\}$ storage nodes. In other words, D2D download
is possible if $h$ or more storage nodes remain in the cell. In
this case, the amount of downloaded data is $h\alpha \ge M$ bits,
where the inequality follows because $h \ge k$. The parameter $h$
depends on the properties of the erasure correcting code used
for storage, and will be discussed in Section IV. In the case
where there are fewer than $h$ storage nodes in the cell, the file
is downloaded from the BS, referred to as BS download. In
this case, $M$ bits are downloaded. To simplify the analysis
in Section III, we assume that the download bandwidth is the
same irrespective of whether the request comes from a storage
node itself or not. This is a reasonable approximation, since
$n \ll N$.

We assume that transmission from the BS and from a node
(in D2D communication) have different costs. We denote by
$\rho_{\mathrm{BS}}$ and $\rho_{\mathrm{D2D}}$ the cost (in cost units (c.u.) per bit, [c.u./bit]) of
transmitting one bit from the BS and from a node, respectively,
and by $\rho = \rho_{\mathrm{BS}}/\rho_{\mathrm{D2D}}$ their ratio. We further assume $\rho \ge 1$,
hence transmission from the BS is at least as costly as the
transmission in D2D communication.

A. Repair Process 

When a storage node leaves the network, its stored data is
lost (see blue node with orange stripes in Fig. 1). Therefore,
another node needs to be populated with data to maintain the
initial state of reliability of the DS network, i.e., $n$ storage
nodes. The restoration (repair) of the lost data onto another node,
chosen uniformly at random from all nodes in the cell that
do not store any content, will be referred to as the repair
process. In particular, we introduce a scheduled repair scheme
where the repair process is launched periodically. We denote
the interval between two repairs by $\Delta$ (in t.u.), $\Delta \ge 0$. Note
that $\Delta = 0$ corresponds to the case of instantaneous repair,
considered in [?].

Similarly to the download, repair can be accomplished from
the storage nodes (D2D repair) or from the BS (BS repair),
with cost per bit $\rho_{\mathrm{D2D}}$ and $\rho_{\mathrm{BS}}$, respectively. The amount of
data (in bits) that needs to be retrieved from the network
to repair a single failed node is referred to as the repair
bandwidth, $\gamma$. In particular, we assume that D2D repair can
be performed from any subset of $r \in \{k, \ldots, n-1\}$ storage
nodes by retrieving $\beta \le \alpha$ bits from each node. In other words,
D2D repair is possible if there are at least $r$ storage nodes in
the cell at the moment of repair. The parameter $r$ is usually
referred to as the repair access in the literature. In this case,
$\gamma_{\mathrm{D2D}} = r\beta$, where the subindex indicates that repair is performed from
the storage nodes. If there are fewer than $r$ storage nodes in
the network at the moment of repair, then the repair is carried
out by the BS. In this case, $\gamma_{\mathrm{BS}} = \alpha$. We assume that repair
always succeeds. Furthermore, for both repair and download
we assume error-free transmission.
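The scheduled repair rule can be illustrated with a small Monte Carlo experiment. This is a sketch under the model assumptions above (i.i.d. exponential lifetimes with rate $\mu$, all $n$ storage nodes present at the start of each interval, D2D repair if at least $r$ nodes remain and BS repair otherwise). The function name, the number of trials and the seed are arbitrary choices made for illustration.

```python
import math
import random


def simulate_repairs(n, r, mu, delta, trials=20000, seed=0):
    """Estimate the average number of D2D and BS repairs per repair interval.

    Each of the n storage nodes independently survives an interval of
    length delta with probability exp(-mu * delta), because exponential
    lifetimes are memoryless.
    """
    rng = random.Random(seed)
    p_stay = math.exp(-mu * delta)
    d2d_total = 0
    bs_total = 0
    for _ in range(trials):
        remaining = sum(1 for _ in range(n) if rng.random() < p_stay)
        lost = n - remaining
        if remaining >= r:
            d2d_total += lost  # enough storage nodes left: D2D repair
        else:
            bs_total += lost   # too few storage nodes left: BS repair
    return d2d_total / trials, bs_total / trials


if __name__ == "__main__":
    avg_d2d, avg_bs = simulate_repairs(n=10, r=2, mu=1.0, delta=0.5)
    print(avg_d2d, avg_bs)
```

The two estimates should add up to roughly $n(1 - e^{-\mu\Delta})$, the expected number of storage nodes that leave during one interval.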

III. Repair and Delivery Cost 

In this section, we derive analytical expressions for the
repair cost, $\mathbb{E}(C_\mathrm{r})$, download cost, $\mathbb{E}(C_\mathrm{d})$, and total cost
$\mathbb{E}(C) = \mathbb{E}(C_\mathrm{r}) + \mathbb{E}(C_\mathrm{d})$, as a function of the repair interval,
$\Delta$. The cost is defined in cost units per bit and time unit
[c.u./(bit × t.u.)].




A. Average Repair Cost 


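Denote by $m_{\mathrm{D2D}}$ and $m_{\mathrm{BS}}$ the average number of nodes
repaired from the storage nodes and from the BS, respectively,
in one repair interval. Also, let $\{b_i(n,p)\}_{i=0}^{n}$ be the
probability mass function (pmf) of the binomial distribution
with parameters $n$ and $p$.

Lemma 1. Consider the DS network in Section II with
parameters $\mu$, $\Delta$, $n$ and $r$. Then

$$m_{\mathrm{D2D}} = \sum_{i=r}^{n} (n-i)\, b_i(n,p), \tag{5}$$

$$m_{\mathrm{BS}} = \sum_{i=0}^{r-1} (n-i)\, b_i(n,p), \tag{6}$$

where $p = e^{-\mu\Delta}$.

Proof: As the inter-departure times are exponentially
distributed, the probability that a storage node has not left
the network during a time $\Delta$ and is accessible for repair is
$p = e^{-\mu\Delta}$. Hence, the probability that $i$ storage nodes are
accessible is $b_i(n,p)$. If only $i$ storage nodes remain in the
network, then $n - i$ repairs need to be performed. D2D repair
is performed if $i \ge r$; BS repair is performed otherwise.
Therefore, (5) and (6) hold. ■

The average repair cost, $\mathbb{E}(C_\mathrm{r})$, is given in the following
theorem.

Theorem 1. Consider the DS network in Section II with
parameters $M$, $\Delta$, $\rho_{\mathrm{BS}}$, $\gamma_{\mathrm{BS}}$, $\rho_{\mathrm{D2D}}$, $\gamma_{\mathrm{D2D}}$, $\mu$, $n$ and $r$. The
average repair cost is

\begin{align}
\mathbb{E}(C_\mathrm{r}) &= \frac{1}{M\Delta}\left(\rho_{\mathrm{BS}}\gamma_{\mathrm{BS}}\, m_{\mathrm{BS}} + \rho_{\mathrm{D2D}}\gamma_{\mathrm{D2D}}\, m_{\mathrm{D2D}}\right) \tag{7} \\
&= \frac{1}{M\Delta}\left(\rho_{\mathrm{BS}}\gamma_{\mathrm{BS}} \sum_{i=0}^{r-1} (n-i)\, b_i(n,p) + \rho_{\mathrm{D2D}}\gamma_{\mathrm{D2D}} \sum_{i=r}^{n} (n-i)\, b_i(n,p)\right), \tag{8}
\end{align}

where $p = e^{-\mu\Delta}$.

Proof: From the system model, it follows that the cost
of repairing a single storage node from the BS is $\rho_{\mathrm{BS}}\gamma_{\mathrm{BS}}$ c.u.
Similarly, the cost of D2D repair of a single node is $\rho_{\mathrm{D2D}}\gamma_{\mathrm{D2D}}$
c.u. Normalizing by the file size ($M$ bits) and the duration of
the repair interval $\Delta$, we obtain (7) in [c.u./(bit × t.u.)]. Finally,
using Lemma 1 we obtain (8). ■


B. Average Download Cost

The average download cost is given in the following theorem.

Theorem 2. Consider the DS network in Section II with
parameters $N$, $\omega$, $M$, $\rho_{\mathrm{BS}}$, $\rho_{\mathrm{D2D}}$, $n$, $h$, $\alpha$, $\mu$ and $\Delta$. Let
$\mu_i = i\mu$, for $i \in \{h, \ldots, n\}$, and $p_i = e^{-\mu_i\Delta}$. Then

\begin{align}
\mathbb{E}(C_\mathrm{d}) &= \frac{N\omega}{M}\left(\rho_{\mathrm{BS}} M \Pr\{\text{BS download}\} + \rho_{\mathrm{D2D}} h\alpha \Pr\{\text{D2D download}\}\right) \notag \\
&= N\omega\left(\rho_{\mathrm{BS}} - \left(\rho_{\mathrm{BS}} - \frac{\rho_{\mathrm{D2D}} h\alpha}{M}\right) \frac{1}{\Delta} \sum_{i=h}^{n} \frac{1 - p_i}{\mu_i} \prod_{\substack{j=h \\ j \ne i}}^{n} \frac{\mu_j}{\mu_j - \mu_i}\right), \tag{9}
\end{align}

where $\Pr\{\text{BS download}\} + \Pr\{\text{D2D download}\} = 1$.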

Proof: See appendix. ■ 
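The expressions in Theorems 1 and 2 are easy to evaluate numerically. The following is a minimal sketch, assuming the binomial sums of Theorem 1 and the closed-form D2D download probability derived in the appendix (the time until fewer than $h$ storage nodes remain is a sum of exponentials with rates $i\mu$). All function and argument names are illustrative and not taken from the original work.

```python
import math


def binom_pmf(i, n, p):
    """b_i(n, p): probability that i of the n storage nodes remain."""
    return math.comb(n, i) * p**i * (1.0 - p) ** (n - i)


def repair_cost(n, r, mu, delta, M, gamma_d2d, gamma_bs, rho_d2d, rho_bs):
    """Average repair cost per bit and time unit (Theorem 1)."""
    p = math.exp(-mu * delta)
    m_bs = sum((n - i) * binom_pmf(i, n, p) for i in range(0, r))
    m_d2d = sum((n - i) * binom_pmf(i, n, p) for i in range(r, n + 1))
    return (rho_bs * gamma_bs * m_bs + rho_d2d * gamma_d2d * m_d2d) / (M * delta)


def d2d_download_prob(n, h, mu, delta):
    """Pr{D2D download}: a request time, taken as uniform on [0, delta),
    falls before the number of storage nodes drops below h."""
    total = 0.0
    for i in range(h, n + 1):
        prod = 1.0
        for j in range(h, n + 1):
            if j != i:
                # mu_j / (mu_j - mu_i) with mu_i = i * mu; mu cancels.
                prod *= j / (j - i)
        # -expm1(-x) equals 1 - exp(-x) and is accurate for small x.
        total += -math.expm1(-i * mu * delta) / (i * mu) * prod
    return total / delta


def download_cost(N, omega, n, h, mu, delta, M, alpha, rho_d2d, rho_bs):
    """Average download cost per bit and time unit (Theorem 2)."""
    pr_d2d = d2d_download_prob(n, h, mu, delta)
    per_request = rho_bs * M * (1.0 - pr_d2d) + rho_d2d * h * alpha * pr_d2d
    return N * omega / M * per_request


if __name__ == "__main__":
    # Hypothetical example: [10, 2, 2] MDS code, M = 1, mu = 1, delta = 0.5.
    rc = repair_cost(10, 2, 1.0, 0.5, 1.0, 1.0, 0.5, 1.0, 200.0)
    dc = download_cost(100, 0.5, 10, 2, 1.0, 0.5, 1.0, 0.5, 1.0, 200.0)
    print(rc, dc, rc + dc)
```

The closed form can be cross-checked against the time average of the binomial tail probability that at least $h$ of the $n$ nodes remain, since each node survives until time $t$ independently with probability $e^{-\mu t}$.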

C. Average Total Cost 

Combining Theorems 1 and 2, one obtains the expression
for $\mathbb{E}(C) = \mathbb{E}(C_\mathrm{r}) + \mathbb{E}(C_\mathrm{d})$. Note that in general $\mathbb{E}(C)$ is
not monotone with $\Delta$. We can derive the following result for
$\Delta \to 0$ and $\Delta \to \infty$.

Corollary 1. $\lim_{\Delta \to 0} \mathbb{E}(C) = \frac{\rho_{\mathrm{D2D}}}{M}\left(n\mu\gamma_{\mathrm{D2D}} + N\omega h\alpha\right)$. Moreover,
for $\mu > 0$, $\lim_{\Delta \to \infty} \mathbb{E}(C) = N\omega\rho_{\mathrm{BS}}$.

For instantaneous repair ($\Delta = 0$), both repair and download
are always performed from the storage nodes. Thus, the two
terms in $\mathbb{E}(C)$ for $\Delta \to 0$ in Corollary 1 correspond to the
repair and download costs in the D2D regime. For $\Delta \to \infty$,
data is never repaired (hence, $\mathbb{E}(C_\mathrm{r}) = 0$). For $\mu > 0$, the
number of storage nodes in the cell will become smaller than
$h$ at some point, and D2D download is not possible. Therefore,
the average download cost is the average BS download cost.
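The $\Delta \to 0$ limit can be verified with a first-order expansion. The sketch below assumes the repair cost is the binomial-sum expression of Theorem 1 with $p = e^{-\mu\Delta}$, uses the fact that a binomial distribution with parameters $n$ and $p$ has mean $np$, and relies on $r \le n - 1$, so that every BS term has $n - i \ge 2$.

```latex
\begin{align*}
\sum_{i=0}^{n} (n-i)\, b_i(n,p) &= n - np = n\left(1 - e^{-\mu\Delta}\right) = n\mu\Delta + O(\Delta^2), \\
\sum_{i=0}^{r-1} (n-i)\, b_i(n,p) &= O\!\left((1-p)^{2}\right) = O(\Delta^2)
  \quad \text{since } i \le r-1 \le n-2, \\
\lim_{\Delta \to 0} \mathbb{E}(C_\mathrm{r})
  &= \frac{\rho_{\mathrm{D2D}}\gamma_{\mathrm{D2D}}}{M} \lim_{\Delta \to 0} \frac{n\mu\Delta + O(\Delta^2)}{\Delta}
   = \frac{\rho_{\mathrm{D2D}}}{M}\, n\mu\gamma_{\mathrm{D2D}}, \\
\lim_{\Delta \to 0} \mathbb{E}(C_\mathrm{d})
  &= \frac{N\omega}{M}\, \rho_{\mathrm{D2D}} h\alpha
  \quad \text{since } \Pr\{\text{D2D download}\} \to 1 .
\end{align*}
```

Adding the last two lines gives the first limit in Corollary 1. For $\Delta \to \infty$ the number of repairs per interval is bounded by $n$, so the repair cost vanishes as $1/\Delta$, while the D2D download probability tends to zero and every request is served by the BS.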


IV. MDS and Regenerating Codes


From Section III it can be seen that the total cost, $\mathbb{E}(C)$,
depends on the DS system parameters $n$, $h$, $r$, $\gamma_{\mathrm{D2D}} = r\beta$,
and $\gamma_{\mathrm{BS}} = \alpha$ (among others). This section describes how, in
turn, these parameters depend on the $(n, k)$ erasure correcting
codes used for storage. We consider as examples MDS codes
and regenerating codes [?].


A. Maximum Distance Separable Codes 

Assume the use of an $(n, k)$ MDS code for DS. Then, due to
the MDS property, D2D repair and D2D download require
contacting $r = h = k$ storage nodes. Moreover, $\beta_{\mathrm{MDS}} = \alpha_{\mathrm{MDS}} = M/k$,
which means that $\gamma_{\mathrm{D2D}} = M$. The fact that an amount
of information equal to the size of the entire file has to be
retrieved to repair a single storage node is a known drawback
of MDS codes [?].

The simplest MDS code is the $n$-replication scheme. In this
case, each storage node stores the entire file, i.e., $\alpha_{\mathrm{rep}} = M$.
For the replication scheme, $r = h = 1$ and $\beta_{\mathrm{rep}} = M$.


B. Regenerating Codes 

A lower repair bandwidth $\gamma_{\mathrm{D2D}}$ (as compared to MDS codes)
can be obtained by using regenerating codes [?], but at the
expense of increasing $r$ [?]. Two main classes of regenerating
codes are covered here, minimum storage regenerating (MSR)
codes and minimum bandwidth regenerating (MBR) codes.
For given $n$ and $k$, MSR codes yield the best storage efficiency,
i.e., $\alpha_{\mathrm{MSR}}$ is minimum, while MBR codes achieve minimum
D2D repair bandwidth, i.e., $\gamma_{\mathrm{D2D}}$ is minimum.








Figure 2. Normalized total cost $\mathbb{E}(C)/(N\omega\rho)$ versus the normalized repair
interval $\mu\Delta$ for MDS, MSR, and MBR codes.

Figure 3. $\mu\Delta_{\max}$ as a function of the cost ratio $\rho$.

For an $(n, k)$ MSR code in a DS system, $h = k$. Moreover,
$r \in \{k, \ldots, n-1\}$ storage nodes are contacted during the
D2D repair process. Hence, the download cost $\mathbb{E}(C_\mathrm{d})$ for an
$(n, k)$ MSR code is equal to that of an $(n, k)$ MDS code.
However, $\beta_{\mathrm{MSR}} = \frac{M}{k} \frac{1}{r-k+1} \le \beta_{\mathrm{MDS}}$ [?]. $\gamma_{\mathrm{D2D}} = r\beta_{\mathrm{MSR}}$ is
minimized for $r = n - 1$. For $r = k$, the total cost $\mathbb{E}(C)$ of
the MSR code is equal to that of the MDS code.

As described in [?], an $(n, k)$ MBR code in a DS system
has $r \in \{h, \ldots, n-1\}$ and $\gamma_{\mathrm{D2D}} = r\beta_{\mathrm{MBR}} = \frac{M}{h} \frac{2r}{2r-h+1}$.
Furthermore, $\gamma_{\mathrm{D2D}} = \alpha_{\mathrm{MBR}} = \frac{M}{k}$ [?], where the last equality
comes from (3). The relationship between $k$, $h$ and $r$ is
therefore $k = \frac{h(2r-h+1)}{2r}$.
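The parameter relations of this section can be collected in a few helper functions. This is a sketch that assumes the MDS, MSR and MBR formulas quoted above with the file size normalized to $M = 1$. It uses exact rational arithmetic, so that non-integer values of $k$ for MBR codes are represented without rounding. The function names and dictionary keys are illustrative.

```python
from fractions import Fraction


def mds_params(n, k, M=1):
    """(n, k) MDS code: r = h = k and beta = alpha = M / k."""
    alpha = Fraction(M, k)
    return {"h": k, "r": k, "k": Fraction(k), "alpha": alpha,
            "beta": alpha, "gamma_d2d": k * alpha, "rate": Fraction(k, n)}


def msr_params(n, k, r, M=1):
    """(n, k) MSR code with repair access r: alpha = M / k,
    beta = (M / k) / (r - k + 1)."""
    alpha = Fraction(M, k)
    beta = alpha / (r - k + 1)
    return {"h": k, "r": r, "k": Fraction(k), "alpha": alpha,
            "beta": beta, "gamma_d2d": r * beta, "rate": Fraction(k, n)}


def mbr_params(n, h, r, M=1):
    """MBR code specified as [n, h, r]: gamma_d2d = alpha = (M/h) * 2r / (2r - h + 1).
    The value of k then follows from alpha = M / k."""
    gamma = Fraction(M, h) * Fraction(2 * r, 2 * r - h + 1)
    alpha = gamma
    k = Fraction(M) / alpha
    return {"h": h, "r": r, "k": k, "alpha": alpha,
            "beta": gamma / r, "gamma_d2d": gamma, "rate": k / n}


if __name__ == "__main__":
    for name, params in [("MDS [10,2,2]", mds_params(10, 2)),
                         ("MSR [10,2,9]", msr_params(10, 2, 9)),
                         ("MBR [10,3,5]", mbr_params(10, 3, 5)),
                         ("5-replication", mds_params(5, 1))]:
        print(name, params)
```

For example, the $[10, 3, 5]$ MBR code gives $k = 12/5$ and therefore a rate of $6/25$, while the $n$-replication scheme is obtained as the $(n, 1)$ MDS code.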

V. Numerical results 

In this section, we evaluate the total cost $\mathbb{E}(C)$ for MDS and
regenerating codes. For the results, we consider a network with
an average of $N = 100$ nodes, request rate $\omega = 0.5$, and a cost ratio
$\rho = 200$. Also, the storage budget is set to $\Gamma = 5$. Without
loss of generality we set $\rho_{\mathrm{D2D}} = 1$ c.u./bit, i.e., $\rho = \rho_{\mathrm{BS}}$. To
specify a code, we use the alternative notation $[n, h, r]$.

Fig. 2 shows the value of the normalized cost $\mathbb{E}(C)/(N\omega\rho)$
versus the normalized repair interval $\mu\Delta$ for $\mu = 50$, for the
$[10, 2, 2]$ MDS code, the $[10, 2, r]$ MSR code with $r \in \{5, 9\}$,
i.e., moderate and high repair access, respectively, the $[10, 3, r]$
MBR code with $r \in \{5, 9\}$, and the 5-replication scheme. The
code rate for all codes is $R = 1/5$, except for the $[10, 3, 5]$
MBR code, which has $R = 6/25 = 0.24$, and the $[10, 3, 9]$ MBR
code, which has $R = 4/15 \approx 0.27$. In the figure, $\mu\Delta = 1$ means
that the repair interval is equal to one average node lifetime.

Figure 4. Normalized total cost $\mathbb{E}(C)/(N\omega\rho)$ for the same codes as in Fig. 2,
but for much smaller normalized repair intervals $\mu\Delta$.

The code parameters are chosen to highlight particularly
interesting behaviors of the different codes. Note that since $\alpha$,
$\beta$ (and hence $\gamma_{\mathrm{D2D}}$) and $\gamma_{\mathrm{BS}}$ are proportional to the file size
$M$, as specified in Section IV, the repair and download costs
in (8) and (9), respectively, are independent of the file size
$M$. From Corollary 1, $\mathbb{E}(C)/(N\omega\rho) \to 1$ (the cost of always
downloading content from the BS) when $\Delta \to \infty$. We observe
from Fig. 2 that this is indeed the case. It is interesting to point
out that the normalized total cost exceeds 1 for values of the
repair interval larger than a threshold $\Delta_{\max}$. We define the
maximum repair interval as

$$\Delta_{\max} = \sup\left\{\Delta : \mathbb{E}(C) < \lim_{\Delta \to \infty} \mathbb{E}(C)\right\}. \tag{10}$$

For $\Delta > \Delta_{\max}$, retrieving the file from the BS is always less
costly, therefore storing data in the nodes is useless. Clearly,
$\Delta_{\max}$ is a function of the cost ratio $\rho$. Fig. 3 shows $\mu\Delta_{\max}$
as a function of $\rho \in [1, 200]$, for all codes in Fig. 2. We
observe that if $\rho < 5$, approximately, it is never beneficial
to use the devices for storage, i.e., the file should always be
downloaded from the BS. As $\rho$ increases, storing data in the
mobile devices is beneficial, if repair is performed with
$\Delta < \Delta_{\max}$. The regenerating codes with high repair access require
very frequent repairs. Although not included here due to space
constraints, the same is true for other MSR and MBR codes
with high repair access. The MDS codes and the regenerating
codes with moderate repair access require less frequent repairs;
for large $\rho$, the repair interval must be at most around 1.5 and
0.5 average node lifetimes, respectively.
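The maximum repair interval can be approximated by scanning a grid of repair intervals and keeping the largest one whose total cost is below the BS-only cost $N\omega\rho_{\mathrm{BS}}$. The sketch below is a crude grid approximation of the supremum, not the method used to produce the figures. It assumes the cost expressions of Section III, evaluates the D2D download probability as a time average of the binomial tail by the midpoint rule, and takes $\mu = 50$ and $M = 1$ as example values. All names are illustrative.

```python
import math


def binom_pmf(i, n, p):
    """Probability that i of n storage nodes remain."""
    return math.comb(n, i) * p**i * (1.0 - p) ** (n - i)


def total_cost(delta, n, h, r, alpha, gamma_d2d, gamma_bs,
               N, omega, mu, rho_bs, rho_d2d=1.0, M=1.0, steps=400):
    """Repair cost plus download cost for repair interval delta."""
    p = math.exp(-mu * delta)
    m_bs = sum((n - i) * binom_pmf(i, n, p) for i in range(r))
    m_d2d = sum((n - i) * binom_pmf(i, n, p) for i in range(r, n + 1))
    repair = (rho_bs * gamma_bs * m_bs + rho_d2d * gamma_d2d * m_d2d) / (M * delta)
    # Time average over [0, delta) of P(at least h nodes remain), midpoint rule.
    pr_d2d = 0.0
    for s in range(steps):
        q = math.exp(-mu * (s + 0.5) * delta / steps)
        pr_d2d += sum(binom_pmf(i, n, q) for i in range(h, n + 1))
    pr_d2d /= steps
    download = N * omega / M * (rho_bs * M * (1.0 - pr_d2d)
                                + rho_d2d * h * alpha * pr_d2d)
    return repair + download


def delta_max_on_grid(grid, **params):
    """Largest grid point whose total cost is below the BS-only cost,
    or None if no grid point qualifies."""
    limit = params["N"] * params["omega"] * params["rho_bs"]
    below = [d for d in grid if total_cost(d, **params) < limit]
    return max(below) if below else None


if __name__ == "__main__":
    mu = 50.0  # example departure rate (assumption for this sketch)
    mds = dict(n=10, h=2, r=2, alpha=0.5, gamma_d2d=1.0, gamma_bs=0.5,
               N=100, omega=0.5, mu=mu)
    grid = [0.05 * j / mu for j in range(1, 101)]  # mu * delta from 0.05 to 5
    d_max = delta_max_on_grid(grid, rho_bs=200.0, **mds)
    print(None if d_max is None else mu * d_max)
```

For $\rho = 1$ the function returns `None`, because D2D download then costs at least as much as BS download and every repair adds a positive cost.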

For the same parameters and codes used in Fig. 2, Fig. 4
shows the normalized total cost for shorter repair intervals. We
observe that instantaneous repair is optimal for the MBR and
MSR codes with $r = 9$ (Fig. 4(a)). On the other hand, $\mathbb{E}(C)$
for the MDS codes and the regenerating codes with moderate
repair access is minimized for $\Delta > 0$ (Fig. 4(b)).

Figure 5. The number of available storage nodes vs. time $t$, within the repair
interval $\Delta$. At $t = 0$, there are $n$ nodes available. During the intervals $T_i$,
there are $i$ nodes. Hence, during the time interval $t \in [0, S_h)$ there are at
least $h$ nodes available for D2D download.

VI. Conclusions 

We considered distributed storage for a wireless network 
where data is stored in a distributed manner across mobile 
devices. We introduced a repair scheduling where the repair of 
the data lost due to device departures is performed periodically. 
We derived analytical expressions for the total communication 
cost, due to repair and download, as a function of the repair 
interval. For a particular network, we showed that there exists a 
maximum value of the repair interval after which retrieving the 
file from the BS is always less costly. Therefore, DS is useful if 
the repair can be performed frequently enough. Instantaneous 
repair is not always the best solution. The optimal repair 
interval that minimizes the total communication cost depends 
on the code used for storage. For a given repair interval, one 
should find the code that minimizes the total communication 
cost. A more thorough investigation is left for future research. 


Appendix A 

Outline of the Proof of Theorem 2

A file request entails a cost $\rho_{\mathrm{D2D}} h\alpha$ with probability
$\Pr\{\text{D2D download}\}$, and a cost $\rho_{\mathrm{BS}} M$ with probability
$\Pr\{\text{BS download}\}$. The overall request rate per t.u. is $N\omega$.
Normalizing by the file size $M$ gives the first equality in (9).
In the following, we prove the last equality of the theorem.

Within a repair interval, the number of storage nodes $n(t)$
in the cell is described by a Poisson death process [?]. Denote
by $T_i$ the time interval for which $n(t) = i$, $i \in \{h, \ldots, n\}$
(see Fig. 5 for illustration). $T_i$ is exponentially distributed with
rate $\mu_i = i\mu$. Denote by $S_h$ the time instant within the repair
interval at which $n(t)$ changes from $h$ to $h - 1$. Then,

$$S_h = \sum_{i=h}^{n} T_i. \tag{11}$$

The pdf of $S_h$ is given by [?]

$$f_{S_h}(t) = \sum_{i=h}^{n} \frac{\mu_n \mu_{n-1} \cdots \mu_h}{\prod_{j=h,\, j \ne i}^{n} (\mu_j - \mu_i)}\, e^{-\mu_i t}, \quad t \ge 0. \tag{12}$$


We are interested in the distribution of file requests within a
repair interval $\Delta$. Let $W_\ell$ be the time instant of the $\ell$th request.
$W_\ell$ is computed as the sum of $\ell$ inter-request times with pdf
given by (4). Thus, $W_\ell$ is an Erlang distributed random variable
with pdf [?]

$$f_{W_\ell}(t) = \frac{\omega^{\ell}\, t^{\ell-1} e^{-\omega t}}{(\ell-1)!}, \quad t \ge 0. \tag{13}$$


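Define $\tilde{W}_\ell = W_\ell \bmod \Delta$. The following result holds.

Lemma 2. The distribution of $\tilde{W}_\ell$ for $t \in [0, \Delta)$ is

$$f_{\tilde{W}_\ell}(t) = \sum_{i=0}^{\infty} \frac{\omega^{\ell}\, (t + i\Delta)^{\ell-1} e^{-\omega (t + i\Delta)}}{(\ell-1)!}.$$

Lemma 3. $\lim_{\ell \to \infty} f_{\tilde{W}_\ell}(t) = \frac{1}{\Delta}$.

The proofs are omitted due to lack of space. It can be
verified numerically that $f_{\tilde{W}_\ell}(t)$ converges to the uniform
distribution already for small values of $\ell$.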
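The numerical check mentioned above can be sketched as follows. The code assumes that the pdf of the request time modulo $\Delta$ is the Erlang pdf summed over all shifts by multiples of $\Delta$ (which holds for any random variable reduced modulo $\Delta$), truncates the infinite sum once the terms are negligible, and uses arbitrary example values for $\omega$ and $\Delta$. Function names are illustrative.

```python
import math


def erlang_pdf(t, ell, omega):
    """Erlang pdf of the ell-th request time, evaluated in log-space."""
    if t < 0:
        return 0.0
    if t == 0:
        return omega if ell == 1 else 0.0
    log_pdf = (ell * math.log(omega) + (ell - 1) * math.log(t)
               - omega * t - math.lgamma(ell))
    return math.exp(log_pdf)


def wrapped_erlang_pdf(t, ell, omega, delta, tol=1e-15):
    """pdf of (W_ell mod delta) at t in [0, delta): sum of shifted Erlang pdfs."""
    total = 0.0
    i = 0
    while True:
        x = t + i * delta
        term = erlang_pdf(x, ell, omega)
        total += term
        # Stop once we are past the Erlang mode and the terms are negligible.
        if x > (ell - 1) / omega and term < tol:
            break
        i += 1
    return total


if __name__ == "__main__":
    omega, delta = 0.5, 1.0  # example values (assumptions)
    for ell in (1, 2, 5):
        deviation = max(abs(wrapped_erlang_pdf(0.1 * j, ell, omega, delta) - 1.0 / delta)
                        for j in range(10))
        print(ell, deviation)
```

The printed deviation from the uniform density $1/\Delta$ shrinks quickly as $\ell$ grows, in line with Lemma 3.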

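D2D download is possible if at least $h$ storage nodes are
available in the network. Thus, given the sequence of random
variables $\{\tilde{W}_1, \tilde{W}_2, \ldots\}$,

$$\Pr\{\text{D2D download}\} = \lim_{L \to \infty} \frac{1}{L} \sum_{\ell=1}^{L} \Pr(\tilde{W}_\ell < S_h) \approx \Pr(\tilde{W}_\infty < S_h),$$

where the approximation follows because for large enough $\ell$,
$f_{\tilde{W}_\ell} \approx \frac{1}{\Delta}$.

Now, using (12), after some calculations we obtain

$$\Pr\{\text{D2D download}\} = \frac{1}{\Delta} \sum_{i=h}^{n} \frac{1 - p_i}{\mu_i} \prod_{\substack{j=h \\ j \ne i}}^{n} \frac{\mu_j}{\mu_j - \mu_i}. \tag{14}$$

Finally, using (14) and $\Pr\{\text{BS download}\} = 1 - \Pr\{\text{D2D download}\}$, we obtain (9). This completes the proof.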